Window Subsequence Problems for Compressed Texts

نویسندگان

  • Patrick Cégielski
  • Irène Guessarian
  • Yury Lifshits
  • Yuri V. Matiyasevich
چکیده

Given two strings (a text t of length n and a pattern p) and a natural number w, window subsequence problems consist in deciding whether p occurs as a subsequence of t and/or finding the number of size (at most) w windows of text t which contain pattern p as a subsequence, i.e. the letters of pattern p occur in the text window, in the same order as in p, but not necessarily consecutively (they may be interleaved with other letters). We are searching for subsequences in a text which is compressed using Lempel-Ziv-like compression algorithms, without decompressing the text, and we would like our algorithms to be almost optimal, in the sense that they run in time O(m) where m is the size of the compressed text. The pattern is uncompressed (because the compression algorithms are evolutive: various occurrences of a same pattern look different in the text).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Faster Subsequence and Don't-Care Pattern Matching on Compressed Texts

Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127–136), where the principal problem is: given a string T represented as a straight line program (SLP) T of size n, a string P of size m, compute the number of minimal subsequence occurrences of P in T . We present ...

متن کامل

On the Computational Complexity of Embedding of Compressed Texts

In this work we consider a well-known problem of processing of compressed texts. We study the following question (called Embedding): whether one compressed text is a subsequence of another compressed text? In this paper we show that Embedding is NPand co-NP-hard.

متن کامل

Faster subsequence recognition in compressed strings

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLPcompressed text of length m̄, and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recogn...

متن کامل

Processing Compressed Texts: A Tractability Border

What kind of operations can we perform effectively (without full unpacking) with compressed texts? In this paper we consider three fundamental problems: (1) check the equality of two compressed texts, (2) check whether one compressed text is a substring of another compressed text, and (3) compute the number of different symbols (Hamming distance) between two compressed texts of the same length....

متن کامل

Online construction of subsequence automata for multiple texts by Hiromasa Hoshino

We consider a deterministic finite automaton which accepts all subsequences of a set of texts, called subsequence automaton. We show an online algorithm for constructing subsequence automaton for a set of texts. It runs in O(|Σ|(m + k) + N) time using O(|Σ|m) space, where |Σ| is the size of alphabet, m is the size of the resulting subsequence automaton, k is the number of texts, N is the total ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006